Load Management Techniques for Distributed Stream Processing
Authors: Ying Xing
Abstract
Abstract of “Load Management Techniques for Distributed Stream Processing” by Ying Xing, Ph.D., Brown University, May 2006.

Distributed and parallel computing environments are becoming inexpensive and commonplace. The availability of large numbers of CPUs makes it possible to process more data at higher speeds. Stream-processing systems are becoming more important, as broad classes of applications require results in real time. Since load can vary in unpredictable ways, exploiting the abundant processor cycles requires effective load management techniques.

Although load distribution has been extensively studied for traditional pull-based systems, it has not yet been fully studied in the context of push-based continuous query processing. Push-based data streams commonly exhibit significant variations in their rates and burstiness over all time scales. Even though medium-to-long-term load variations can be handled rather effectively with dynamic load migration techniques, short-term variations are much more problematic. Capturing such variations and reactively migrating load can be prohibitively expensive or even impractical. We need robust solutions whose performance does not degrade significantly in the presence of time-varying load.

In this dissertation, we present both a static operator distribution technique and a dynamic operator redistribution technique for a cluster of stream processing engines. First, for operators that are impractical to move between processing engines on the fly, we propose a resilient static operator distribution algorithm. It aims to avoid overload by maximizing the feasible space of the operator distribution plan. Second, for operators with relatively small load migration overheads, we propose a correlation-based dynamic operator distribution algorithm. It aims to minimize end-to-end latency by minimizing load variance and maximizing load correlation.
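The two ideas named above can be made concrete with a small Python sketch. It is not the dissertation's algorithm. It assumes a simple linear load model, where each operator's CPU load is a weighted sum of input-stream rates. All function names (`feasible_fraction`, `pearson`, `correlation_greedy_place`) are hypothetical.

- `feasible_fraction` only evaluates a given static plan. It estimates by sampling how much of a box of input rates the plan keeps free of overload, which is one way to compare plans by the size of their feasible space.
- `correlation_greedy_place` is a simplified greedy heuristic. It puts each operator on a lightly loaded node whose load time series is least correlated with the operator's own series. Fluctuations on the same node then tend to cancel, which keeps per-node load variance low.

```python
import random
import statistics
from typing import Dict, List, Sequence


def feasible_fraction(
    plan: Dict[int, List[int]],        # node id -> indices of operators placed on that node
    op_coeffs: List[Sequence[float]],  # op_coeffs[j][k]: CPU cost of operator j per unit rate of input k
    capacity: Dict[int, float],        # node id -> CPU capacity
    max_rate: Sequence[float],         # upper bound of each input rate in the sampled box
    samples: int = 20000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the share of the rate box [0, max_rate] in which no node is overloaded."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(samples):
        rates = [rng.uniform(0.0, m) for m in max_rate]
        # Assumed linear load model: a node's load is the sum over its operators of coeff . rates.
        if all(
            sum(op_coeffs[j][k] * r for j in ops for k, r in enumerate(rates)) <= capacity[n]
            for n, ops in plan.items()
        ):
            ok += 1
    return ok / samples


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant (correlation undefined)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def correlation_greedy_place(op_loads: Dict[str, List[float]], num_nodes: int) -> Dict[int, List[str]]:
    """Greedy sketch: put each operator on a lightly loaded node whose load series is least correlated with it."""
    length = len(next(iter(op_loads.values())))
    node_series = [[0.0] * length for _ in range(num_nodes)]
    plan: Dict[int, List[str]] = {n: [] for n in range(num_nodes)}
    # Heaviest operators first; Python's sort is stable, so ties keep their input order.
    for op in sorted(op_loads, key=lambda o: statistics.fmean(op_loads[o]), reverse=True):
        series = op_loads[op]
        means = [statistics.fmean(s) for s in node_series]
        avg = statistics.fmean(means)
        # Crude balance rule (an assumption): only nodes at or below the average mean load are candidates;
        # fall back to all nodes in case floating-point rounding leaves no candidate.
        candidates = [n for n in range(num_nodes) if means[n] <= avg] or list(range(num_nodes))
        # Lowest correlation wins; ties are broken by lower mean load, then by node id.
        best = min(candidates, key=lambda n: (pearson(series, node_series[n]), means[n], n))
        plan[best].append(op)
        node_series[best] = [a + b for a, b in zip(node_series[best], series)]
    return plan
```

As a usage example, consider four operators of equal mean load on two nodes: two with an alternating series `[1, 2, 1, 2]` and two with the mirror image `[2, 1, 2, 1]`. The greedy pairs each operator with an anti-correlated partner, so both nodes end up with a flat load series.

For the feasible-space check, take two inputs and one unit-cost operator per input. Placing the two operators on separate unit-capacity nodes keeps the whole unit square of rates feasible. Stacking both on one node shrinks the feasible share to about half.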
Our experimental results quantify the effectiveness of the proposed approaches and demonstrate that they significantly outperform traditional operator distribution approaches.

Load Management Techniques for Distributed Stream Processing
by Ying Xing
B. Eng., Automation Engineering, Tsinghua University, 1997
Sc. M., Computer Science, Tsinghua University, 1999
Sc. M., Computer Science, Brown University, 2001
Sc. M., Applied Mathematics, Brown University, 2004

A dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in the Department of Computer Science at Brown University
Providence, Rhode Island
May 2006

© Copyright 2006 by Ying Xing

This dissertation by Ying Xing is accepted in its present form by the Department of Computer Science as satisfying the dissertation requirement for the degree of Doctor of Philosophy.
Stan Zdonik, Director

Recommended to the Graduate Council
Uğur Çetintemel, Reader
John Jannotti, Reader

Approved by the Graduate Council
Sheila Bonde, Dean of the Graduate School
Similar resources
Contract-Based Load Management in Federated Distributed Systems
This paper focuses on load management in loosely coupled federated distributed systems. We present a distributed mechanism for moving load between autonomous participants using bilateral contracts that are negotiated offline and that set bounded prices for moving load. We show that our mechanism has good incentive properties, efficiently redistributes excess load, and has a low overhead in pract...
Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing
In distributed stream processing environments, large numbers of continuous queries are distributed onto multiple servers. When one or more of these servers become overloaded due to bursty data arrival, excessive load needs to be shed in order to preserve low latency for the query results. Because of the load dependencies among the servers, load shedding decisions on these servers must be well-c...
Fault-tolerance and load management in a distributed stream processing system
Advances in monitoring technology (e.g., sensors) and an increased demand for online information processing have given rise to a new class of applications that require continuous, low-latency processing of large-volume data streams. These “stream processing applications” arise in many areas such as sensor-based environment monitoring, financial services, network monitoring, and military applicat...
Load Management and High Availability in the Borealis Distributed Stream Processing Engine
Borealis is a distributed stream processing engine that has been developed at Brandeis University, Brown University, and MIT. It extends the first generation of data stream processing systems with advanced capabilities such as distributed operation, scalability with time-varying load, high availability against failures, and dynamic data and query modifications. In this paper, we focus on aspects...
Distributed Data Streams
DEFINITION: A majority of today’s data is constantly evolving and fundamentally distributed in nature. Data for almost any large-scale data-management task is continuously collected over a wide area, and at a much greater rate than ever before. Compared to traditional, centralized stream processing, querying such large-scale, evolving data collections poses new challenges, due mainly to the phys...